"Young" APE-type non-LTR retrotransposons (non-LTRs) typically encode two open reading frames (ORFs 1 and 2). The shorter ORF1 translation product (ORF1p) comprises an RNA-binding activity, thought to bind to non-LTR transcript RNA, protect against nuclease degradation and specify nuclear import of the ribonucleoprotein complex (RNP). ORF2 encodes a multifunctional protein (ORF2p) comprising apurinic/apyrimidinic endonuclease (APE) and reverse-transcriptase (RT) activities, responsible for genome replication and re-integration into chromosomal DNA. However, some clades of APE-type non-LTRs only encode a single ORF, corresponding to the multifunctional ORF2p outlined above (and for simplicity referred to as ORF2 below). The absence of an ORF1 correlates with the acquisition of a 2A oligopeptide translational recoding element (some 18-30 amino acids) into the N-terminal region of ORF2p. In the case of non-LTRs encoding two ORFs, the presence of ORF1 would necessarily downregulate the translation of ORF2. We argue that in the absence of an ORF1, 2A could provide the corresponding translational downregulation of ORF2. While multiple molecules of ORF1p are required to decorate the non-LTR transcript RNA in the cytoplasm, conceivably only a single molecule of ORF2p is required for target-primed reverse transcription/integration in the nucleus. Why would the translation of ORF2 need to be controlled by such mechanisms? An "excess" of ORF2p could result in disadvantageous levels of genome instability by, for example, enhancing short, interspersed, element (SINE) retrotransposition and the generation of processed pseudogenes. If so, the acquisition of mechanisms, such as 2A, to control ORF2p biogenesis would be advantageous.

The discovery and characterization of 2A oligopeptide translational recoding sequences within the genomes of non-LTR retrotransposons (non-LTRs) reveals another interesting parallel between the molecular biology of non-LTRs and viruses. Initially, we thought the occurrence of these short 2A/2A-like sequences (2As) was confined to RNA virus genomes, but 2A sequences were discovered in L1Tc non-LTRs within the genome of trypanosome species. From this single occurrence, it was not possible to gauge the significance of this observation for either the molecular biology or the evolution of non-LTRs. Recently, however, 2As have been characterized from a range of different types of non-LTR and, perhaps of equal significance, from the genomes of a wide range of species. These new data add support to the notion that acquisition of 2As is of functional significance and, indeed, may represent a significant step in the generation of a sub-group of APE-type non-LTR retrotransposons with a different method of controlling protein biogenesis.

What Are 2A Translational Recoding Sequences?

Figure 1. Sequences encoding protein domains (or complete genes) can be concatenated via 2As to create a single ORF. The mRNA encodes a single ORF, translation potentially giving rise to three alternative products, depending upon the recoding activity of the 2A sequence in question. Outcome (i) corresponds to the production of two individual translation products: protein 1 with a C-terminal extension of 2A, plus protein 2. Outcome (ii) arises from termination of translation at the C-terminus of 2A. Outcome (iii) produces a full-length translation product (A). The genomic organization of the picornavirus foot-and-mouth disease virus is shown (~8,500 nts). The 5' terminus of the vRNA is covalently bound to an oligopeptide (VPg), rather than a 7meG mRNA cap structure. The long 5'NCR comprises an internal ribosome entry sequence (IRES) which initiates translation of the long ORF in a cap-independent manner. The capsid proteins polyprotein domain is separated from the replication protein domains by the recoding activity of 2A. The structure of the L1Tc transcript RNA is shown with the position of 2A in the N-terminal region of the ORF. For comparison, the equivalent RNA organisations are shown for the "ancient" type of non-LTR (a single ORF comprising RT and REL domains), together with the canonical "young" non-LTRs encoding 2 ORFs, in which the REL domain in ORF2 is supplanted by the APE domain (B).

A group of oligopeptide sequences collectively known as 2A mediate a translational recoding event known as "ribosome skipping," "StopGo" or "Stop Carry-on" translation. Briefly, the model
of this non-canonical form of translation proposes that when a ribosome translates an mRNA sequence encoding 2A, the nascent 2A oligopeptide interacts with the exit tunnel of the ribosome (through which the elongating polypeptide product leaves the structure) and "stalls" the progress of the ribosome. Although a stop codon has not been encountered, the nascent peptide is released (forming the C-terminus of 2A), but then translation may resume, synthesizing the polypeptide sequence downstream of 2A: the synthesis of the peptide bond at this specific point in the protein backbone is "skipped." For the vast majority of readers who are not familiar with translational (as opposed to transcriptional) control of protein biogenesis, there are a number of essential points which should be noted with regard to the mechanism of 2As and their potential function in the molecular biology of non-LTRs. The first is that although encoded as a single ORF, proteins comprising one or more 2As are not produced as a full-length translation product, but synthesized as multiple (shorter) discrete products (Fig. 1A). This is quite distinct from the ORF being translated as a single polypeptide which is subsequently "processed" into smaller products. The second point is that there is an alternative outcome: rather than resumption (pseudo-reinitiation) of the translation of mRNA sequences downstream of 2A, translation may terminate at the C-terminus of 2A (Fig. 1A). We have proposed that translational (cellular) stress promotes termination rather than pseudo-reinitiation and that 2A may act to regulate the synthesis of downstream sequences, determining the ratio of translation products up- and downstream of 2A. The third point is that for some 2As, the activity outlined above is not complete. Depending upon the nature of the 2A sequence in question, the interaction of 2A with the ribosome exit tunnel is weaker: a proportion of the ribosomes do not pause at the C-terminus of 2A and the peptide bond is formed in the normal manner (Fig. 1A). Lastly, the study of 2A and its extensive use in biotechnological and biomedical applications shows that these sequences work in all eukaryotic cell-types tested to date, but not in bacteria.
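The three outcomes described above can be turned into simple bookkeeping. The sketch below is a hypothetical two-parameter model, not a published one: `p_release` (the assumed probability that the nascent chain is released at 2A) and `p_resume` (the assumed probability that translation then resumes) are names and values invented here for illustration. It shows how the ratio of products up- and downstream of 2A would shift if stress lowered the chance of resumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoAOutcomes:
    """Expected products per ribosome that reaches the 2A coding sequence."""
    upstream: float      # protein 1 with C-terminal 2A (outcomes i and ii)
    downstream: float    # protein 2, made only if translation resumes (outcome i)
    full_length: float   # read-through product, peptide bond formed (outcome iii)


def two_a_outcomes(p_release: float, p_resume: float) -> TwoAOutcomes:
    """Hypothetical model of 2A recoding (both parameters are assumptions).

    p_release: probability that the nascent chain is released at 2A.
    p_resume:  probability that translation resumes after release.
    """
    for name, p in (("p_release", p_release), ("p_resume", p_resume)):
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {p}")
    return TwoAOutcomes(
        upstream=p_release,
        downstream=p_release * p_resume,
        full_length=1.0 - p_release,
    )


if __name__ == "__main__":
    # Illustrative values only; they are not measurements from the article.
    print("unstressed:", two_a_outcomes(p_release=0.9, p_resume=0.8))
    print("stressed:  ", two_a_outcomes(p_release=0.9, p_resume=0.2))
```

In this toy model the downstream-to-upstream ratio equals `p_resume`, so anything that favors termination over pseudo-reinitiation lowers the yield of the downstream product without changing the yield of the upstream one.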

"2A" derives from the systematic 
nomenclature of protein domains within the polyprotein encoded by the single ORF of a group of positive-stranded (+ve strand; mRNA sense) RNA viruses, the family Picornaviridae (Fig. 1B). Much of the early characterization of 2A arose from the study of the picornavirus foot-and-mouth disease virus (FMDV; genus Aphthovirus). As sequence databases expanded it became apparent that 2A-like sequences were present in the genomes of many other genera of the Picornaviridae, and in other families of +ve and double-stranded RNA viruses. Interestingly, 2As were also detected within L1Tc non-LTR retrotransposons (non-LTRs) in the genomes of Trypanosoma spp. Naturally, similarity with such a short sequence could occur by chance, but analyses of 40 L1Tc sequences (representatives of over 100 such elements detected in the genome) showed all contained a 2A-like sequence in the same N-terminal region of the long ORF encoding the apurinic/apyrimidinic DNA endonuclease (APE) and reverse-transcriptase (RT) domains (Fig. 1B). Furthermore, mutations were detected within a motif conserved at the C-terminus of 2A: -DxExNPG↓P- (vertical arrow indicating the site of the recoding event). This motif is very important for recoding activity, and the mutations observed among the different L1Tc non-LTRs correlated with changes in translational recoding activity.
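As a minimal sketch of how the conserved C-terminal motif could be located in a translated ORF, the snippet below scans a protein string for D-x-E-x-N-P-G followed by P. The function name `find_2a_motifs` and the toy sequence are hypothetical, and a bare regular-expression hit is only a candidate: as noted above, a match to so short a motif can occur by chance.

```python
import re

# Motif from the text, -DxExNPG|P-, where x is any residue.
# The lookahead requires the downstream proline without consuming it,
# so the match ends exactly at the recoding site.
MOTIF = re.compile(r"D[A-Z]E[A-Z]NPG(?=P)")


def find_2a_motifs(protein: str) -> list[tuple[int, int]]:
    """Return (motif_start, recoding_site) pairs as 0-based indices.

    recoding_site is the index of the proline that follows the glycine,
    i.e. the first residue of the product downstream of 2A.
    """
    seq = protein.upper()
    return [(m.start(), m.end()) for m in MOTIF.finditer(seq)]


if __name__ == "__main__":
    # Toy sequence invented for illustration; it is not a real L1Tc ORF.
    toy = "MKTAAADVESNPGPMSKG"
    for start, site in find_2a_motifs(toy):
        print(f"motif at {start}, upstream product ends with {toy[:site][-7:]}")
```

A point mutation in any fixed position of the motif (for example the final glycine) makes the pattern fail to match, which mirrors the observation that such mutations change recoding activity.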

A single ancestral form of L1Tc could have encoded this 2A prior to proliferation of these elements in the genome of Trypanosoma cruzi, with the subsequent accumulation of mutations. As more genome sequences became available, however, it became clear that 2As were encoded by non-LTRs within (i) a number of different clades of non-LTRs and (ii) genomes of a wide range of different species. The notion that these 2As have a functional significance in the biology of these elements is supported by their occurrence in the same position within ORF2p. The genomes of the "ancient" non-LTRs encode a single ORF comprising a multifunctional protein with reverse transcriptase (RT) and restriction enzyme-like endonuclease (REL-endo) domains: here, re-integration is sequence-specific. The REL-endo domain is lost in "young" non-LTRs and replaced with an APE domain, integration now being sequence-independent (Fig. 1B; reviewed in refs. 13-15). These young non-LTRs have also acquired another ORF (ORF1) upstream of the long ORF (ORF2, comprising the APE and RT domains: Fig. 1B), with ORF2 of some elements also encoding RNaseH domains. Our bioinformatic analyses indicated a correlation between young non-LTRs encoding 2A and the apparent loss of ORF1. It should be noted, however, that the low processivity of RT is thought to be responsible for the frequent 5' truncation of non-LTR genomes during retrotransposition, which may be a factor in our bioinformatic analyses.

The Functions and Biogenesis of ORF1p

ORF1 encodes a protein (ORF1p) with low sequence similarity among different non-LTRs. Functional studies have shown that (i) ORF1p is a high-affinity RNA-binding protein which forms a ribonucleoprotein (RNP) particle together with non-LTR transcript RNA, (ii) ORF1p contains signals required for the nuclear import of this RNP complex, (iii) there is a stringent cis-requirement for ORF1p during L1 retrotransposition, (iv) ORF1p possesses nucleic acid chaperone activity in the case of a long, interspersed, element (LINE)-like transposable element in Drosophila (the I factor) and (v) deaminase-independent restriction of L1 by APOBEC3C requires an RNA-dependent interaction between human L1 ORF1p and APOBEC3C dimers.

Our analyses showed the majority of non-LTRs encoding a 2A-like sequence did not possess an ORF1 (comprising the functions outlined above), but that non-LTR sequences encoding 2A, and lacking ORF1, clustered alongside elements from other species which did encode an ORF1 but not a 2A. Indeed, none of the elements within the Ingi clade (Ingi, Tcoingi, Tvingi, and L1Tc) appears to encode an ORF1, and the related Vingi elements only encode a single ORF. It should also be noted, however, that "ancient" non-LTRs do not possess an ORF1 and the retrotransposition of SINEs is mediated by LINEs in trans. It has been shown that although L1 ORF1p is not required for Alu SINE retrotransposition, this process is enhanced by supplementation with L1 ORF1p. By analogy, it seems plausible that ORF1p functions could be supplied in trans from those non-LTRs in the genome encoding and expressing ORF1p, to form a functional RNP from those non-LTR transcript RNAs only encoding ORF2: ORF2p being supplied in cis. The few non-LTRs we identified which encode both ORF1 and a 2A-like sequence within ORF2 (e.g., CR1-26_BF, CR1-53_BF, and CR1-1_LG) may represent an intermediate evolutionary stage in the loss of ORF1 (ref. 2).

The non-LTR RNA transcript comprises an atypical RNA polymerase II (or Pol III) promoter in the 5' non-coding region (NCR) such that transposition of a full-length non-LTR element does not require chance integration adjacent to a Pol II promoter (reviewed in refs. 23 and 24). Eukaryotic translation requires the assembly of the cap-binding protein complex at the 5' cap structure followed by "scanning" of the 5'NCR until an initiating AUG start codon in a suitable Kozak consensus sequence is encountered. There must be, therefore, a high selective pressure upon the 5'NCR of non-LTRs against the occurrence of AUGs, at least in the context of a favorable Kozak consensus sequence: ORF1 can be translated in the "normal" (cap-dependent) manner. Once the ORF1 stop codon is encountered, translation terminates. The question arises, therefore: how is the translation of ORF2 initiated?
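The scanning argument above can be illustrated with a short sketch that walks a 5'NCR from the 5' end and reports the first AUG in a favorable context. It assumes a common simplification of the Kozak consensus (a purine at position -3 and a G at position +4, counting the A of AUG as +1); real initiation contexts are more graded than this, and the function name `first_favorable_aug` and the example sequence are hypothetical.

```python
from typing import Optional


def first_favorable_aug(rna: str) -> Optional[int]:
    """Return the 0-based index of the first AUG in a favorable context.

    Assumed simplification of the Kozak consensus: purine (A or G) at -3
    and G at +4, where the A of AUG is +1. DNA input (T) is accepted.
    Returns None when no such AUG exists.
    """
    seq = rna.upper().replace("T", "U")
    pos = seq.find("AUG")
    while pos != -1:
        purine_at_minus3 = pos >= 3 and seq[pos - 3] in "AG"
        g_at_plus4 = pos + 3 < len(seq) and seq[pos + 3] == "G"
        if purine_at_minus3 and g_at_plus4:
            return pos
        pos = seq.find("AUG", pos + 1)
    return None


if __name__ == "__main__":
    # Invented sequence: the first AUG (index 3) has C at -3 and is skipped,
    # the second (index 12) has A at -3 and G at +4.
    leader = "CCUAUGCCCACCAUGGCC"
    print(first_favorable_aug(leader))
```

Under this simplification, a 5'NCR subject to the selective pressure described above would be expected to return the ORF1 start codon itself rather than any upstream position.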

The Initiation of Translation of Non-LTR ORFs

Positive-stranded RNA viruses (such as picornaviruses) and non-LTR transcript RNAs face a common problem: how to generate multiple functions from a single mRNA. In the case of picornaviruses, a single ORF encodes a (multifunctional) polyprotein, although the full-length translation product is not observed within infected cells due to the action of virus-encoded proteinases and the 2A-mediated ribosome skipping mechanism (reviewed in ref. 25). Here, polyproteins are "processed" into functional domains in a series of co- and post-translational cleavage events. In the case of APE-type non-LTRs, the strategy is to encode two ORFs: both translation products are multifunctional, but neither is processed.

In picornaviruses the long 5'NCR contains multiple AUGs, but translation of the single long ORF is initiated not in the canonical mRNA 7meG-cap-dependent manner, but by a non-canonical mechanism. The picornavirus 5'NCR comprises an RNA secondary structural feature, an Internal Ribosome Entry Sequence (IRES), which mediates initiation in a cap-independent mechanism: ribosomes do not scan through the 5'NCR, but are "delivered" directly to the correct AUG. Viruses employ a range of other mechanisms to initiate translation, including ribosome "shunting," leaky scanning, initiation at non-AUG codons, and re-initiation (reviewed in ref. 26). In essence, 2A mediates a "pseudo-reinitiation," although in this special case only for ribosomes already engaged in the elongation cycle of translation.



How translation initiates for ORF2 remains an interesting, and important, question. In the case of the SART1 element an overlapping stop-start codon (-UAAUG-) links ORFs 1 and 2. Interestingly, an RNA secondary structure downstream of this site has been shown to affect the efficiency of the initiation of translation of ORF2. In this specific case there is no intergenic region separating ORFs 1 and 2. For the vast majority of non-LTRs, however, an intergenic sequence is present, and in some cases appears to be hundreds of bases long. Following termination of translation of upstream ORFs (uORFs), it is known that certain post-termination events can lead to re-initiation of translation of a downstream ORF. The efficiency of this re-initiation is dependent upon the length of the uORF (time taken to translate) and the length of the intergenic region (reviewed in ref. 28). By these criteria, it seems highly improbable that this mechanism can be at play in ORF2p biogenesis. In most cases non-LTR intergenic regions are long enough to comprise an IRES. Although first characterized in picornaviruses, IRESes are also found in the 5'NCR of certain cellular genes (e.g., c-Myc, Apaf-1, Bcl-2, XIAP, DAP5). It is thought that their expression (translation) is controlled by these elements, particularly under conditions of cellular stress or in response to specific stimuli (reviewed in refs. 29-31). In the insect dicistroviruses the intergenic region internal ribosome entry site (IGR IRES) can directly assemble 80S ribosomes and initiate translation at a non-AUG codon from the ribosomal A-site. These activities arise from two independently folded domains of the IGR IRES. The first domain, composed of two overlapping RNA pseudoknots (PKII/III), mediates recruitment of the ribosome, while the second domain, composed of a single RNA pseudoknot (PKI), mimics a tRNA anticodon-codon interaction, thereby positioning the non-AUG codon at the ribosomal A-site. If such IGR IRESes were present in the genomes of non-LTRs, then the notion of what comprises ORF2, and the bioinformatic analyses, would need to be revisited.

Whichever of the mechanisms mentioned above mediates the translation of ORF2, the translation of the second ORF will be much less efficient than the first. Presumably, it takes a substantially greater quantity of ORF1p to form an RNP complex than of the associated ORF2p required to perform target-primed reverse transcription/integration. This is consistent with the genomic organization of non-LTRs and the report that even though relatively high levels of L1Tc mRNAs are detected within trypanosome cells, only low levels of ORF2p protein could be detected. In this case, however, L1Tc does not encode an ORF1, and it may be that, if translational "downregulation" of ORF2 is required, this is now being achieved not by the presence of a uORF (ORF1), but by the 2A translational recoding element in the N-terminal region of the ORF.

Concluding Remarks 

The study of the molecular biology of viruses is greatly simplified by the ability to infect virtually all susceptible tissue-culture cells to produce a strong, synchronous "signal" from the object of study. Sequencing of virus genomes is straightforward and, generally, the sequence represents a biologically active entity. Due to the biology of non-LTRs, the equivalent experimental analyses are very much more complex or simply impossible. In 2002 a biologically active genome of the picornavirus, poliovirus, was synthesized from synthetic oligonucleotides. Using the same approach, this was soon followed by the creation of a synthetic mouse L1 non-LTR genome (ORFeus-Mm) and subsequently the human L1 counterpart, ORFeus-Hs. Synthetic biology was used to optimise codon usage within ORFs 1 and 2 in a step-wise, incremental manner. Only when the synthetic, optimised portion of the genome was extended to comprise the entire ORF1 and the N-terminal two-thirds of ORF2 were L1 expression and retrotransposition frequency markedly enhanced, consistent with the notion that control over ORF2 protein biogenesis is key in downregulating the activities of these elements. This strategy also allows manipulation of non-LTR genomes at will, and can be used to resolve many of the outstanding questions surrounding the molecular biology of this fascinating group of mobile genetic elements: the non-LTR sequence databases should provide a rich resource for synthetic/molecular biologists to exploit. Alexander Pope wrote that "The proper study of Mankind is Man": these exciting new tools can now be used to study the ~17% of the human genome that are LINEs and the many diseases which arise from retrotransposition in humans (reviewed in refs. 35 and 36).

The reader may have noted that the
interesting questions as to "the how" and
the "from where" these 2A sequences have